Automatic generation of context-dependent pronunciations
Authors
Abstract
We describe experiments in modelling the dynamics of fluent speech, in which word pronunciations are modified by their neighbouring context. Based on all-phone decoding of large volumes of training data, we automatically derive new word pronunciations and context-dependent transformation rules for phone sequences. In contrast to existing techniques, the rules can be applied even to words not in the training set, and across word boundaries, thus modelling context-dependent behaviour. We apply the technique to the Wall Street Journal (WSJ) training data and evaluate the new pronunciations and rules on WSJ and broadcast news test sets. The changes correct a significant portion of the errors they could potentially correct. However, the transformations introduce a comparable number of new errors, suggesting that stronger constraints on the application of such rules may be needed.
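The abstract does not specify the rule format or the derivation procedure, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes canonical (dictionary) phones and all-phone decoder output are already aligned one-to-one, with deletions marked by an empty string and insertions not modelled. It also assumes that a rule is keyed on one phone of left and right context, and that ARPAbet-style phone symbols are used. The names `derive_rules`, `apply_rules`, `min_count` and `min_ratio` are hypothetical. This is not the authors' implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical rule key: (left phone, canonical phone, right phone).
# "#" marks utterance start/end; contexts are read across word boundaries.
Context = Tuple[str, str, str]
BOUNDARY = "#"


def derive_rules(
    aligned_utterances: Sequence[Sequence[Tuple[str, str]]],
    min_count: int = 2,
    min_ratio: float = 0.5,
) -> Dict[Context, str]:
    """Learn context-dependent rewrite rules from aligned phone pairs.

    Each utterance is a list of (canonical_phone, decoded_phone) pairs; an empty
    decoded phone ("") stands for a deletion. A rule is kept only when the most
    frequent realisation in a context differs from the canonical phone, occurs at
    least min_count times, and covers at least min_ratio of that context's tokens.
    """
    by_context: Dict[Context, Counter] = defaultdict(Counter)
    for utt in aligned_utterances:
        canon = [BOUNDARY] + [c for c, _ in utt] + [BOUNDARY]
        for i, (c, d) in enumerate(utt, start=1):
            by_context[(canon[i - 1], c, canon[i + 1])][d] += 1

    rules: Dict[Context, str] = {}
    for ctx, counts in by_context.items():
        total = sum(counts.values())
        best, count = counts.most_common(1)[0]
        if best != ctx[1] and count >= min_count and count / total >= min_ratio:
            rules[ctx] = best
    return rules


def apply_rules(words: Sequence[List[str]], rules: Dict[Context, str]) -> List[str]:
    """Rewrite the concatenated canonical phones of a word sequence.

    Words are joined before matching, so a rule can fire across a word boundary
    and inside words never seen in training. Contexts are read from the canonical
    string, so one rewrite never triggers another.
    """
    canon = [BOUNDARY] + [p for w in words for p in w] + [BOUNDARY]
    out: List[str] = []
    for i in range(1, len(canon) - 1):
        new = rules.get((canon[i - 1], canon[i], canon[i + 1]), canon[i])
        if new:  # an empty string means the phone is deleted
            out.append(new)
    return out


if __name__ == "__main__":
    # Toy aligned data: "would you" and "should you" with /d/ palatalised to /jh/.
    training = [
        [("w", "w"), ("uh", "uh"), ("d", "jh"), ("y", "y"), ("uw", "uw")],
        [("sh", "sh"), ("uh", "uh"), ("d", "jh"), ("y", "y"), ("uw", "uw")],
    ]
    rules = derive_rules(training)
    print(rules)  # {('uh', 'd', 'y'): 'jh'}
    print(apply_rules([["k", "uh", "d"], ["y", "uw"]], rules))
    # ['k', 'uh', 'jh', 'y', 'uw']
```

In this toy example, "could" never appears in the training data. The rule learned from "would you" and "should you" still rewrites "could you", and it fires across the word boundary between /d/ and /y/. Raising `min_count` or `min_ratio` is one simple way to restrict when rules fire, in the spirit of the stronger constraints the abstract suggests may be needed.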
Similar resources
Automatic modeling of pronunciation variations
We report on an automatic method for discovering an appropriate model topology for each context-dependent phoneme, allowing for such phenomena as reduced pronunciations and substituted phonemes. The method leads to a reduction in the word error rate on both the Wall Street Journal and Broadcast News databases.
Pronunciation lexicon adaptation for TTS voice building
This paper describes reducing phone label errors in TTS voice building by means of modeling of speaker pronunciation variants. Each speaker has his or her own unique pronunciations (and context-dependent variations), so that no one standard lexicon is able to cover all of the speaker’s variations. Creating speaker-dependent pronunciation lexicons for automatic speech labeling of our TTS voice d...
Automatic generation of multiple pronunciations based on neural networks
We propose a method for automatically generating a pronunciation dictionary based on a pronunciation neural network that can predict plausible pronunciations (alternative pronunciations) from the canonical pronunciation. This method can generate multiple forms of alternative pronunciations using the pronunciation network. For generating a sophisticated alternative pronunciation dictionary, two ...
Improving TTS by higher agreement between predicted versus observed pronunciations
This paper looks at improving unit selection text-to-speech (TTS) quality by optimizing the agreement between frontend and speech database. We focused, in particular, on two classes of problems causing degradation in synthesis quality: 1) realization of /d/ and /t/ sounds and 2) confusions of unstressed vowels, especially with schwas. We investigated two approaches to tackling these problems. ...
Multi-level decision trees for static and dynamic pronunciation models
We have been focusing on improving pronunciation models for automatic transcription of television and radio news reports by modeling phone, syllable, and word pronunciation distributions with decision trees. These models were employed in two separate sets of experiments. First, decision trees facilitated selection of word pronunciations derived automatically from data for use in a standard spee...